Stata：coefplot 如何选择交互项系数估计值进行绘图？

Original RStata RStata 2023-10-24

收录于合集 #Stata 绘图案例合集（二） 9个

为了让大家更好的理解本文的内容，欢迎各位培训班会员参加明晚 8 点的直播课 「Stata：coefplot 如何选择交互项系数估计值进行绘图」

最近有小伙伴问了这样的一个问题：

也就是使用 coefplot 展示系数估计值与置信区间时如何选择交互项变量。

方法一：查看所有可用的变量名称

这里我们使用 auto 数据集为例：

sysuse auto, clear
*> (1978 automobile data)

在回归中使用因子交乘算子：

reg price weight rep78##foreign 

*> note: 1b.rep78#1.foreign identifies no observations in the sample.
*> note: 2.rep78#1.foreign identifies no observations in the sample.
*> note: 5.rep78#1.foreign omitted because of collinearity.
*> 
*>       Source |       SS           df       MS      Number of obs   =        69
*> -------------+----------------------------------   F(8, 60)        =      9.05
*>        Model |   315359465         8  39419933.1   Prob > F        =    0.0000
*>     Residual |   261437494        60  4357291.57   R-squared       =    0.5467
*> -------------+----------------------------------   Adj R-squared   =    0.4863
*>        Total |   576796959        68  8482308.22   Root MSE        =    2087.4
*> 
*> -------------------------------------------------------------------------------
*>         price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
*> --------------+----------------------------------------------------------------
*>        weight |   3.811384   .4666457     8.17   0.000     2.877954    4.744815
*>               |
*>         rep78 |
*>            2  |   435.9862   1654.487     0.26   0.793    -2873.481    3745.454
*>            3  |   738.2337   1538.028     0.48   0.633     -2338.28    3814.748
*>            4  |  -330.3094   1644.223    -0.20   0.841    -3619.246    2958.627
*>            5  |   3984.978   2154.133     1.85   0.069      -323.93    8293.886
*>               |
*>       foreign |
*>      Foreign  |    398.453   1644.867     0.24   0.809    -2891.772    3688.677
*>               |
*> rep78#foreign |
*>    1#Foreign  |          0  (empty)
*>    2#Foreign  |          0  (empty)
*>    3#Foreign  |   3281.889   2245.576     1.46   0.149    -1209.932    7773.709
*>    4#Foreign  |   5029.403   2076.434     2.42   0.018     875.9157     9182.89
*>    5#Foreign  |          0  (omitted)
*>               |
*>         _cons |  -7250.791   2066.713    -3.51   0.001    -11384.83    -3116.75
*> -------------------------------------------------------------------------------

直接使用 coefplot 就可以绘制所有变量的：

coefplot

那么该如何仅仅选择交互项的系数绘制呢？

回归的结果也会被存储在返回值中，查看 e-class 返回值：

eret list 

*> scalars:
*>                   e(N) =  69
*>                e(df_m) =  8
*>                e(df_r) =  60
*>                   e(F) =  9.046888971642481
*>                  e(r2) =  .5467425923475249
*>                e(rmse) =  2087.412650144713
*>                 e(mss) =  315359464.5505149
*>                 e(rss) =  261437494.3190505
*>                e(r2_a) =  .4863082713271949
*>                  e(ll) =  -620.4989339800943
*>                e(ll_0) =  -647.7986144493904
*>                e(rank) =  9
*> 
*> macros:
*>             e(cmdline) : "regress price weight rep78##foreign"
*>               e(title) : "Linear regression"
*>           e(marginsok) : "XB default"
*>                 e(vce) : "ols"
*>              e(depvar) : "price"
*>                 e(cmd) : "regress"
*>          e(properties) : "b V"
*>             e(predict) : "regres_p"
*>               e(model) : "ols"
*>           e(estat_cmd) : "regress_estat"
*> 
*> matrices:
*>                   e(b) :  1 x 19
*>                   e(V) :  19 x 19
*>                e(beta) :  1 x 18
*> 
*> functions:
*>              e(sample)

回归系数都会被存储在 e(b) 中：

*> e(b)[1,19]
*>                         1b.          2.          3.          4.          5.         0b.          1.   1b.rep78#
*>         weight       rep78       rep78       rep78       rep78       rep78     foreign     foreign  0b.foreign
*> y1   3.8113843           0   435.98624   738.23368  -330.30942   3984.9781           0   398.45297           0
*> 
*>       1b.rep78#   2o.rep78#   2o.rep78#   3o.rep78#    3.rep78#   4o.rep78#    4.rep78#   5o.rep78#   5o.rep78#
*>     1o.foreign  0b.foreign  1o.foreign  0b.foreign   1.foreign  0b.foreign   1.foreign  0b.foreign  1o.foreign
*> y1           0           0           0           0   3281.8889           0   5029.4026           0           0
*> 
*>                
         *> _cons
*> y1  -7250.7912

提取这个矩阵的列名：

matrix A = e(b)
local namecol: colnames A

*- 打印出来：
foreach i in `namecol' {
 di "`i'"
}

*> weight
*> 1b.rep78
*> 2.rep78
*> 3.rep78
*> 4.rep78
*> 5.rep78
*> 0b.foreign
*> 1.foreign
*> 1b.rep78#0b.foreign
*> 1b.rep78#1o.foreign
*> 2o.rep78#0b.foreign
*> 2o.rep78#1o.foreign
*> 3o.rep78#0b.foreign
*> 3.rep78#1.foreign
*> 4o.rep78#0b.foreign
*> 4.rep78#1.foreign
*> 5o.rep78#0b.foreign
*> 5o.rep78#1o.foreign
*> _cons

这些列名都是可以在 coefplot 的 keep() 选项中使用的：

coefplot, keep(3.rep78#1.foreign 4.rep78#1.foreign)

需要注意的是，e(b) 中值为 0 的不能用，这些是被 omit 的变量：

coefplot, keep(5o.rep78#1o.foreign)

*> (.: no coefficients found, all dropped, or none kept)
*> (nothing to plot)

也可以使用 foreach 循环提取 e(b) 中值不为 0 、含有 # 的进行绘图：

local namecol "`: colnames A'"
di "`namecol'"

*> weight 1b.rep78 2.rep78 3.rep78 4.rep78 5.rep78 0b.foreign 1.foreign 1b.rep78#0b.foreign 1b.rep78#1o.foreign 2o.re
*> > p78#0b.foreign 2o.rep78#1o.foreign 3o.rep78#0b.foreign 3.rep78#1.foreign 4o.rep78#0b.foreign 4.rep78#1.foreign 5
*> > o.rep78#0b.foreign 5o.rep78#1o.foreign _cons

local varlist = ""
foreach i in `namecol' {
    local v = A[1, "`i'"]
    if `v' != 0 & index("`i'" , "#") {
        local varlist = "`varlist' `i'"
    }
}

di "`varlist'" 
*> 3.rep78#1.foreign 4.rep78#1.foreign
coefplot, keep(`varlist')

方法二：使用通配符

还可以使用通配符：

reg price weight rep78##foreign
coefplot, keep(*rep78*foreign*)

另外提取回归系数的时候也可以遵循这个方法流程：

sysuse auto, clear 
reg price i.rep78 

*>       Source |       SS           df       MS      Number of obs   =        69
*> -------------+----------------------------------   F(4, 64)        =      0.24
*>        Model |  8360542.63         4  2090135.66   Prob > F        =    0.9174
*>     Residual |   568436416        64     8881819   R-squared       =    0.0145
*> -------------+----------------------------------   Adj R-squared   =   -0.0471
*>        Total |   576796959        68  8482308.22   Root MSE        =    2980.2
*> 
*> ------------------------------------------------------------------------------
*>        price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
*> -------------+----------------------------------------------------------------
*>        rep78 |
*>           2  |   1403.125   2356.085     0.60   0.554    -3303.696    6109.946
*>           3  |   1864.733   2176.458     0.86   0.395    -2483.242    6212.708
*>           4  |       1507   2221.338     0.68   0.500    -2930.633    5944.633
*>           5  |     1348.5   2290.927     0.59   0.558    -3228.153    5925.153
*>              |
*>        _cons |     4564.5   2107.347     2.17   0.034     354.5913    8774.409
*> ------------------------------------------------------------------------------

*- 查看返回值
eret list 

*- 回归系数都会存储在 e(b) 中
mat list e(b)

*- 提取 e(b) 的列名
mat A = e(b)
mat list A 
local namecol: colnames A
foreach i in `namecol' {
 di "`i'"
}

*> 1b.rep78
*> 2.rep78
*> 3.rep78
*> 4.rep78
*> 5.rep78
*> _cons

所以虚拟变量的系数可以这样提取：

di _b[1b.rep78]
0 
di _b[2.rep78]
1403.125

方法三：从返回值中提取数据使用 twoway 直接绘图

对该方法感兴趣的小伙伴也可以学习下这个课程：
「Stata：如何展示回归系数和置信区间」（https://rstata.duanshu.com/#/brief/course/2d3950fe01ed46119619b615dc8a90e4）

coefplot 虽然方便，但是有时候可能不能完全满足需要，所以也可以直接使用 twoway 绘制：

sysuse auto, clear 
reg price weight rep78##foreign

由于 eret list 中的 e(b) 中只存储了回归系数，不足以直接绘图，所以下面我们还是使用 ret list 中的 r(table) 数据绘图：

ret list
mat m = r(table)
mat list m 

*> m[9,19]
*>                             1b.          2.          3.          4.          5.         0b.
*>             weight       rep78       rep78       rep78       rep78       rep78     foreign
*>      b   3.8113843           0   435.98624   738.23368  -330.30942   3984.9781           0
*>     se   .46664575           .   1654.4874   1538.0279   1644.2232   2154.1333           .
*>      t   8.1676181           .   .26351742   .47998717  -.20089086   1.8499218           .
*> pvalue   2.538e-11           .   .79305471   .63298176   .84146336   .06925106           .
*>     ll   2.8779538           .  -2873.4813  -2338.2803  -3619.2456  -323.93005           .
*>     ul   4.7448147           .   3745.4537   3814.7476   2958.6267   8293.8862           .
*>     df          60          60          60          60          60          60          60
*>   crit   2.0002978   2.0002978   2.0002978   2.0002978   2.0002978   2.0002978   2.0002978
*>  eform           0           0           0           0           0           0           0
*> 
*>                  1.   1b.rep78#   1b.rep78#   2o.rep78#   2o.rep78#   3o.rep78#    3.rep78#
*>            foreign  0b.foreign  1o.foreign  0b.foreign  1o.foreign  0b.foreign   1.foreign
*>      b   398.45297           0           0           0           0           0   3281.8889
*>     se   1644.8673           .           .           .           .           .   2245.5759
*>      t   .24224019           .           .           .           .           .    1.461491
*> pvalue   .80942015           .           .           .           .           .    .1490985
*>     ll  -2891.7715           .           .           .           .           .  -1209.9317
*>     ul   3688.6775           .           .           .           .           .   7773.7094
*>     df          60          60          60          60          60          60          60
*>   crit   2.0002978   2.0002978   2.0002978   2.0002978   2.0002978   2.0002978   2.0002978
*>  eform           0           0           0           0           0           0           0
*> 
*>           4o.rep78#    4.rep78#   5o.rep78#   5o.rep78#            
*>         0b.foreign   1.foreign  0b.foreign  1o.foreign       _cons
*>      b           0   5029.4026           0           0  -7250.7912
*>     se           .   2076.4342           .           .    2066.713
*>      t           .   2.4221343           .           .  -3.5083687
*> pvalue           .   .01846747           .           .   .00086119
*>     ll           .   875.91573           .           .  -11384.833
*>     ul           .   9182.8895           .           .  -3116.7497
*>     df          60          60          60          60          60
*>   crit   2.0002978   2.0002978   2.0002978   2.0002978   2.0002978
*>  eform           0           0           0           0           0
*>

r(table) 的列名实际上也是和 e(b) 一样的。

然后我们再把这个数据转换为 dta 数据：

local namecol: colnames m 
local n: word count `namecol' 
di "`n'"
*> 19 

clear 
set obs `n'
gen v = ""
forval i = 1/`n' {
    local vi: word `i' of `namecol' 
    replace v = "`vi'" in `i'
}

gen b = .
gen ll = .
gen ul = .

forval i = 1/`n'{
    replace b = m[1, `i'] in `i'
    replace ll = m[5, `i'] in `i'
    replace ul = m[6, `i'] in `i'
}

*- 删除被 omit 的变量
drop if b == 0 

*- 有序因子化
gen id = _n 
sencode v, gen(varn) gsort(-id)

list 

*>     +-------------------------------------------------------------------------------+
*>     |                 v           b          ll         ul   id                varn |
*>     |-------------------------------------------------------------------------------|
*>  1. |            weight    3.811384    2.877954   4.744815    1              weight |
*>  2. |           2.rep78    435.9862   -2873.481   3745.454    2             2.rep78 |
*>  3. |           3.rep78    738.2337    -2338.28   3814.748    3             3.rep78 |
*>  4. |           4.rep78   -330.3094   -3619.246   2958.627    4             4.rep78 |
*>  5. |           5.rep78    3984.978   -323.9301   8293.886    5             5.rep78 |
*>     |-------------------------------------------------------------------------------|
*>  6. |         1.foreign     398.453   -2891.771   3688.677    6           1.foreign |
*>  7. | 3.rep78#1.foreign    3281.889   -1209.932   7773.709    7   3.rep78#1.foreign |
*>  8. | 4.rep78#1.foreign    5029.403    875.9157    9182.89    8   4.rep78#1.foreign |
*>  9. |             _cons   -7250.791   -11384.83   -3116.75    9               _cons |
*>     +-------------------------------------------------------------------------------+

然后就可以使用 twoway 绘图了：

tw rspike ll ul varn, horizontal || ///
    sc varn b, leg(off) m(o) ///
    yti("变量") xti("系数估计值 + 95% 置信区间") ///
    xline(0) yla(1(1)9, value)

如果想给 y 轴添加标签的话，可以这样：

tw rspike ll ul varn, horizontal || ///
    sc varn b, leg(off) m(o) ///
    yti("变量") xti("系数估计值 + 95% 置信区间") ///
    xline(0) yla(9 "Weight (lbs.)" 8 "Repair record 1978 = 2" ///
        7 "Repair record 1978 = 3" ///
        6 "Repair record 1978 = 4" ///
        5 "Repair record 1978 = 5" ///
        4 "Foreign" ///
        3 "Repair record 1978 = 3 # Foreign" ///
        2 "Repair record 1978 = 4 # Foreign" ///
        1 "_cons", value)

矩阵转换成 dta 也可以直接使用 svmat：

clear 
svmat m 
xpose, clear 
keep v1 v5 v6 
ren v1 b 
ren v5 ll 
ren v6 ul 

*- 编写一个程序可以把列表转换成变量的（也可以保存成 list2dta.ado 文件方便以后使用）
cap prog drop list2dta
prog def list2dta
    syntax anything, name(string) 
    local n: word count `anything' 
    if `n' > `=_N' set obs `n'
    gen `name' = ""
    forval i = 1/`n' {
        local vi: word `i' of `anything' 
        replace v = "`vi'" in `i'
    }
end

*- 然后就可以使用了 
local namecol: colnames m 
list2dta `namecol', name(v) 
drop if b == 0
list 

*>     +------------------------------------------------------+
*>     |         b          ll         ul                   v |
*>     |------------------------------------------------------|
*>  1. |  3.811384    2.877954   4.744815              weight |
*>  2. |  435.9862   -2873.481   3745.454             2.rep78 |
*>  3. |  738.2337    -2338.28   3814.748             3.rep78 |
*>  4. | -330.3094   -3619.246   2958.627             4.rep78 |
*>  5. |  3984.978   -323.9301   8293.886             5.rep78 |
*>     |------------------------------------------------------|
*>  6. |   398.453   -2891.771   3688.677           1.foreign |
*>  7. |  3281.889   -1209.932   7773.709   3.rep78#1.foreign |
*>  8. |  5029.403    875.9157    9182.89   4.rep78#1.foreign |
*>  9. | -7250.791   -11384.83   -3116.75               _cons |
*>     +------------------------------------------------------+

然后后面的操作就一样了。

方法四：使用 serset 提取图表数据修改后再重新绘制

对于这部分内容感兴趣的小伙伴还可以前往系列课程「Stata 编程导论」的第 15 课时：Do file 编程实战（六）学习，里面提供了更多的案例。
系列课程「Stata 编程导论」：https://rstata.duanshu.com/#/course/500094e7213744bc904e4e2cb270d135 Do file 编程实战（六）：https://rstata.duanshu.com/#/course/27a3b7a55a2545a0a6281ae8fafaf15d

使用 coefplot 绘图之后我们可以使用 graph save 把图表保存为 gph 文件：

sysuse auto, clear 
reg price weight rep78##foreign
coefplot 
gr save "coefplot.gph", replace

重新调用：

clear 
gr use coefplot.gph

为了下面使用方便，编写一个 setnames 命令：

*- 编写一个 setnames 命令（可以保存为 setnames.ado 文件）
capture program drop setnames
program define setnames
    syntax anything
    local varname
    foreach i of varlist _all {
        local varname = "`varname' `i'"
    }
    ren (`varname') (`anything')
end

gph 格式的图文件会保存绘制其的代码和数据，数据可以使用 serset 调用出来：

*- 查看绘图使用的数据（注意这个代码需要重 command 窗口运行，否则不会起效果）
serset dir 

*> 0.  9 observations on 4 variables
*>     __00000Q __00000R __00000F __00000I

serset 0
serset use, clear

查看绘图代码：

gr desc coefplot 

*>     name:  coefplot
*>   format:  live
*>  created:  1 Aug 2023 18:45:50
*>   scheme:  lightrstata
*>     size:  4 x 6
*> dta file:  /Applications/Stata/ado/base/a/auto.dta dated 3 Jun 2023 12:59
*>  command:  twoway (rspike __00000Q __00000R __00000F if __00000E==1, pstyle(p1) lwidth(*1)
*>            horizontal) (scatter __00000F __00000I if __00000E==1, pstyle(p1) ), ylabel(1
*>            `"Weight (lbs.)"' 2 `"Repair record 1978=2"' 3 `"Repair record 1978=3"' 4 `"Repair
*>            record 1978=4"' 5 `"Repair record 1978=5"' 6 `"Foreign"' 7 `"Repair record 1978=3
*>            # Foreign"' 8 `"Repair record 1978=4 # Foreign"' 9 `"_cons"', nogrid
*>            angle(horizontal) ) ytick(1 2 3 4 5 6 7 8 9, notick tlstyle(none) grid )
*>            yscale(range(.5 9.5)) yscale(reverse) yti("") xti("") legend(label(2 `"."') all
*>            order(2) off) plotregion(margin(t=0 b=0))

稍微改动好看点：

gen __00000E = 1 
twoway (rspike __00000Q __00000R __00000F if __00000E==1, ///
        pstyle(p1) lwidth(*1) horizontal) ///
    (scatter __00000F __00000I if __00000E==1, pstyle(p1) ), ///
        ylabel(1 `"Weight (lbs.)"' 2 `"Repair record 1978=2"' ///
            3 `"Repair record 1978=3"' 4 `"Repair record 1978=4"' ///
            5 `"Repair record 1978=5"' 6 `"Foreign"' ///
            7 `"Repair record 1978=3 # Foreign"' ///
            8 `"Repair record 1978=4 # Foreign"' ///
            9 `"_cons"', nogrid angle(horizontal) ) ///
        ytick(1 2 3 4 5 6 7 8 9, notick tlstyle(none) grid ) ///
        yscale(range(.5 9.5)) yscale(reverse) yti("") xti("") ///
        legend(label(2 `"."') all order(2) off) ///
        plotregion(margin(t=0 b=0))

可以看到交乘项是 __00000F = 2 或者 __00000F = 3

keep if inlist(__00000F, 2, 3)

然后就可以重新绘制了：

twoway (rspike __00000Q __00000R __00000F if __00000E==1, ///
        pstyle(p1) lwidth(*1) horizontal) /// 
    (scatter __00000F __00000I if __00000E==1, pstyle(p1) ), ///
        ylabel(2 `"Repair record 1978=2"' /// 
            3 `"Repair record 1978=3"', nogrid angle(horizontal) ) ///
        ytick(2 3, notick tlstyle(none) grid ) ///
        yscale(range(1.5 3.5)) yscale(reverse) yti("") xti("") ///
        legend(label(2 `"."') all order(2) off) ///
        plotregion(margin(t=0 b=0))

这样我们就非常彻底的解决了这个问题！

直播信息

为了让大家更好的理解上面的内容，欢迎各位培训班会员参加明晚 8 点的直播课 「Stata：coefplot 如何选择交互项系数估计值进行绘图」

直播地址：腾讯会议(需要报名 RStata 培训班参加)
讲义材料：需要报名 RStata 培训班，详情可阅读：一起来学习 R 语言和 Stata 啦！学习过程中遇到的问题也可以随时提问！

更多关于 RStata 会员的更多信息可添加微信号 r_stata 咨询：

附件下载（点击文末的阅读原文即可跳转）：
https://rstata.duanshu.com/#/brief/course/e877d6b6020447e89530f36639d10c6f

一把短刀，怎么就让他连捅18人？！

黄晓菁，这位杭州泰隆银行女员工自爆视频火了，带给我们那些思考？

这一刻，快乐被具象化了

海南省拟任干部人选公

刑满释放人员曝监狱乱象：犯人不仅可以打牌赌博，还能用手机与外界联系

Stata：coefplot 如何选择交互项系数估计值进行绘图？

方法一：查看所有可用的变量名称

方法二：使用通配符

方法三：从返回值中提取数据使用 twoway 直接绘图

方法四：使用 serset 提取图表数据修改后再重新绘制

直播信息

您可能也对以下帖子感兴趣

一把短刀，怎么就让他连捅18人？！

黄晓菁，这位杭州泰隆银行女员工自爆视频火了，带给我们那些思考？

这一刻，快乐被具象化了

海南省拟任干部人选公

刑满释放人员曝监狱乱象：犯人不仅可以打牌赌博，还能用手机与外界联系

生成图片，分享到微信朋友圈

Stata：coefplot 如何选择交互项系数估计值进行绘图？

方法一：查看所有可用的变量名称

方法二：使用通配符

方法三：从返回值中提取数据使用 twoway 直接绘图

方法四：使用 serset 提取图表数据修改后再重新绘制

直播信息

您可能也对以下帖子感兴趣